
    Propositional Satisfiability Method in Rough Classification Modeling for Data Mining

    The fundamental problem in data mining is whether the whole of the available information is always necessary to represent the information system (IS). The goal of data mining is to find rules that model the world sufficiently well. These rules consist of conditions over attribute-value pairs, called the description, and a classification of the decision attribute. However, the set of all decision rules generated from all conditional attributes can be too large and can contain many chaotic rules that are not appropriate for classifying unseen objects. The search for the best rules must therefore be performed, because it is not possible to determine the quality of all rules generated from the information system. In the rough set approach to data mining, the set of interesting rules is determined using the notion of a reduct. Rules are generated from reducts by binding the condition attribute values of the object class from which the reduct originates to the corresponding attributes. It is important for the reducts to be minimal in size. Minimal reducts decrease the number of conditional attributes used to generate rules. Shorter rules are expected to classify new cases more accurately because of their larger support in the data, and in some sense the most stable and frequently appearing reducts give the best decision rules. The main work of the thesis is the generation of a classification model with a smaller number of rules, shorter rule length and good accuracy. A propositional satisfiability method for rough classification modeling is proposed in this thesis. Two models, Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP), are proposed to represent the minimal reduct computation problem. The models involve a theoretical formalisation of the discernibility relation of a decision system (DS) as an Integer Programming (IP) model. The proposed models were embedded within the default rules generation framework, yielding a new rough classification method. An improved branch and bound strategy is proposed to solve the SIP and DRIP models, pruning a portion of the search. The strategy uses a conflict analysis procedure to remove unnecessary attribute assignments and to determine the branch level to which the search backtracks in a non-chronological manner. Five data sets from the UCI machine learning repository and domain theories were used in the experiments. The total number of rules generated for the best classification model is recorded, where 30% of the data were used for training and 70% were kept as test data. The classification accuracy, number of rules and maximum rule length obtained with the SIP/DRIP method were compared with other rough set methods such as the Genetic Algorithm (GA), Johnson, Holte 1R, Dynamic and Exhaustive methods. Four of the datasets were then chosen for further experiments. The improved search strategy implements non-chronological backtracking, which can prune a large portion of the search space. The experimental results showed that the proposed SIP/DRIP method is a successful method for rough classification modeling. The outstanding feature of this method is the reduced number of rules in all classification models. SIP/DRIP generated shorter rules than the other methods on most datasets. The proposed search strategy indicated that the best performance can be achieved at a lower level, that is, along a shorter path, of the search tree. The SIP/DRIP method also showed promise compared with other commonly used classifiers such as neural networks and statistical methods. This model is expected to represent the knowledge of the system efficiently.
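
    To make the reduct computation concrete, the following Python sketch illustrates the set-cover view that integer programming formulations such as SIP/DRIP build on: every pair of objects with different decisions contributes a discernibility clause, and a reduct is a smallest attribute subset that hits every clause. The toy decision table and the exhaustive subset search are illustrative assumptions only; the thesis solves the corresponding 0-1 models with an improved branch and bound using conflict analysis, which is not reproduced here.

```python
from itertools import combinations

# Toy decision table: each row is (conditional attribute values, decision).
# Attribute names and values are illustrative only.
attributes = ["a1", "a2", "a3", "a4"]
table = [
    ({"a1": 1, "a2": 0, "a3": 1, "a4": 0}, "yes"),
    ({"a1": 1, "a2": 1, "a3": 0, "a4": 0}, "no"),
    ({"a1": 0, "a2": 0, "a3": 1, "a4": 1}, "yes"),
    ({"a1": 0, "a2": 1, "a3": 0, "a4": 1}, "no"),
]

# Discernibility clauses: for every pair of objects with different decisions,
# collect the attributes on which they differ.  In the IP view, each clause
# becomes a constraint "sum of the chosen attributes in the clause >= 1".
clauses = []
for (x, dx), (y, dy) in combinations(table, 2):
    if dx != dy:
        diff = frozenset(a for a in attributes if x[a] != y[a])
        if diff:
            clauses.append(diff)

def minimal_reduct():
    # Exhaustive 0-1 search for the smallest attribute subset covering all
    # clauses; this is what the IP / branch-and-bound solves, minus the pruning.
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            chosen = set(subset)
            if all(clause & chosen for clause in clauses):
                return chosen
    return set(attributes)

print("minimal reduct:", minimal_reduct())
```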

    Data Preprocessing: Case Study on monthly number of visitors to Taiwan by their residence and purpose

    This paper explains in detail the preliminary data report on the dataset and how data pre-processing, mainly data cleaning and data reduction, is applied to it. The dataset used is the monthly number of visitors to Taiwan by their residence and purpose. The dataset was obtained from Kaggle and was originally scraped from the Taiwan Tourism Bureau. The surveys were carried out using foreign visitor data covering all foreign visitors who arrived in Taiwan directly through airports, ports and land crossings.
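
    As a minimal pandas sketch of the cleaning and reduction steps described above, the example below works on a small inline sample; the column names (year, month, residence, purpose, visitors) and values are assumptions, since the exact schema of the Kaggle file is not given here.

```python
import pandas as pd

# Small inline sample standing in for the Kaggle "visitors to Taiwan" data;
# the real dataset has many more rows and possibly different column labels.
df = pd.DataFrame({
    "year":      [2018, 2018, 2018, 2018, 2018],
    "month":     [1, 1, 1, 1, 1],
    "residence": ["Japan", "Japan", "Korea", None, "Japan"],
    "purpose":   ["Business", "Business", "Pleasure", "Pleasure", "Business"],
    "visitors":  ["1200", "1200", "800", "50", "n/a"],
})

# Cleaning: drop exact duplicates and rows with a missing residence or purpose,
# then coerce visitor counts to numeric (unparsable entries become 0).
df = df.drop_duplicates()
df = df.dropna(subset=["residence", "purpose"])
df["visitors"] = pd.to_numeric(df["visitors"], errors="coerce").fillna(0)

# Reduction: aggregate monthly counts by residence and purpose so that
# later analysis works on a much smaller table.
monthly = (
    df.groupby(["year", "month", "residence", "purpose"], as_index=False)["visitors"]
      .sum()
)
print(monthly)
```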

    Indicator selection based on Rough Set Theory

    A method for indicator selection is proposed in this paper. The method, which adopts the General Methodology and Design Research approach, consists of four steps: Problem Identification, Requirement Gathering, Indicator Extraction, and Evaluation. A rough set approach is applied in the Indicator Extraction phase, which itself consists of six steps: Data Selection, Data Preprocessing, Discretization, Split Data, Reduction, and Classification. A dataset of 427 records has been used for experimentation. The dataset, which contains financial information from several companies, consists of 30 dependent indicators and one independent indicator. The selection of indicators is based on rough set theory, in which sets of reducts are computed from the dataset. Based on the sets of reducts, indicators are ranked and selected according to a set of criteria; indicators are ranked by the frequency with which they appear in the reduct sets. The major contribution of this work is the extraction method for identifying a reduced set of indicators. The results obtained show competitive accuracy in classifying new cases, indicating that the quality of the knowledge is maintained through the use of a reduced set of indicators.
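
    The frequency-based ranking of indicators can be sketched as follows; the reduct sets and indicator names are placeholders (real reducts would come from a rough set tool), and the cut-off criterion is an assumption rather than the paper's exact rule.

```python
from collections import Counter

# Placeholder reduct sets; in practice these are computed from the dataset
# with a rough set tool, and the indicator names are hypothetical.
reducts = [
    {"ROE", "debt_ratio", "current_ratio"},
    {"ROE", "net_margin"},
    {"debt_ratio", "net_margin", "asset_turnover"},
    {"ROE", "debt_ratio"},
]

# Rank indicators by how often they appear across the reduct sets.
freq = Counter(indicator for reduct in reducts for indicator in reduct)
ranking = freq.most_common()
print("ranking:", ranking)

# Keep indicators that appear in at least half of the reducts
# (an illustrative cut-off, not the paper's exact criterion).
threshold = len(reducts) / 2
selected = [indicator for indicator, count in ranking if count >= threshold]
print("selected indicators:", selected)
```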

    A comparative study of deep learning algorithms in univariate and multivariate forecasting of the Malaysian stock market

    As a part of the financial system, the stock market is an essential factor in the growth and stability of the national economy. Investment in the stock market is risky because of its price complexity and unpredictable nature. Deep learning is an emerging approach to stock market prediction modeling that can learn the non-linearity and complexity of stock market data. To date, few studies on stock market prediction in Malaysia have employed deep learning prediction models, especially for handling univariate and multivariate data. This study aims to develop univariate and multivariate stock market forecasting models using three deep learning algorithms and to compare the performance of those models. The models predict the closing price of the Malaysian stock market using the Axiata Group Berhad and Petronas Gas Berhad datasets from Bursa Malaysia, both listed on the Kuala Lumpur Composite Index (KLCI). Three deep learning algorithms, Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), are used to develop the prediction models. The deep learning models achieved the highest accuracy and outperformed the baseline models in both short- and long-term forecasts. LSTM proved to be the best deep learning model for the Malaysian stock market, achieving the lowest prediction error among the models. Deep learning demonstrates the ability to handle univariate and multivariate data while preserving important information for forecasting the stock market. This finding is significant, as deep learning works well with high-dimensional datasets.
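
    A minimal sketch of univariate LSTM forecasting in the spirit of this study is shown below, using a synthetic series in place of the Axiata or Petronas Gas close prices; the window length, layer sizes and training settings are illustrative assumptions rather than the tuned configuration used in the paper.

```python
import numpy as np
import tensorflow as tf

def make_windows(series, window=10):
    # Turn a univariate close-price series into (window, next value) pairs.
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    X = np.array(X, dtype="float32")[..., np.newaxis]   # (samples, window, 1)
    return X, np.array(y, dtype="float32")

# Synthetic series standing in for a KLCI close-price column.
prices = np.sin(np.linspace(0, 20, 500)) + np.linspace(0, 1, 500)
X, y = make_windows(prices)
split = int(0.8 * len(X))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1], 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=10, batch_size=32, verbose=0)

rmse = float(np.sqrt(model.evaluate(X[split:], y[split:], verbose=0)))
print("test RMSE:", rmse)
```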

    Multi layer perceptron modelling in the housing market

    The study examines the use of a multi layer perceptron (MLP) network in predicting the price of terrace houses in Kuala Lumpur (KL). Nine factors that significantly influence the price were used. Housing data from 1994 to 1996 were presented to the network for training. Test results from the model for various years were compared using regression analysis. The study demonstrates the predictive ability of the trained MLP model, which can serve as an alternative predictor in real estate analysis.
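
    A minimal scikit-learn sketch of MLP-based price prediction is given below; the synthetic nine-factor data and the layer sizes are assumptions, since the original KL terrace-house data and network configuration are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for nine price factors and the resulting house price.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))
price = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)
scaler = StandardScaler().fit(X_train)

# A small multilayer perceptron regressor; the hidden layer sizes are arbitrary.
mlp = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)
print("R^2 on held-out data:", mlp.score(scaler.transform(X_test), y_test))
```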

    An improved artificial dendrite cell algorithm for abnormal signal detection

    In the dendrite cell algorithm (DCA), the abnormality of a data point is determined by comparing the multi-context antigen value (MCAV) with an anomaly threshold. The limitation of the existing threshold is that its value needs to be determined before mining, based on previous information, and the existing MCAV is inefficient when exposed to extreme values. This causes the DCA to fail to detect new data points whose pattern differs from the previous information, which affects detection accuracy. This paper proposes an improved anomaly threshold for the DCA based on the statistical cumulative sum (CUSUM), with the aim of improving its detection capability. In the proposed approach, the MCAV is normalized with an upper CUSUM, and the new anomaly threshold is calculated at run time by considering the acceptance value and the minimum MCAV. In experiments on 12 benchmark datasets and two outbreak datasets, the improved DCA is shown to achieve better detection results than its previous version in terms of sensitivity, specificity, false detection rate and accuracy.
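
    The run-time thresholding idea can be illustrated with a standard upper CUSUM over a stream of MCAV scores; the target, slack and threshold values below are illustrative assumptions, not the calibrated settings of the improved DCA.

```python
import numpy as np

def upper_cusum_flags(mcav, target=0.5, slack=0.05, threshold=0.3):
    """Flag anomalous points in a stream of MCAV scores with an upper CUSUM.
    The target, slack and threshold defaults are illustrative only."""
    s = 0.0
    flags = []
    for value in mcav:
        # Accumulate only positive drift above (target + slack).
        s = max(0.0, s + value - target - slack)
        flags.append(s > threshold)
    return flags

# Example MCAV stream: mostly low values with a burst of high (anomalous) ones.
mcav_stream = np.concatenate([np.full(20, 0.2), np.full(5, 0.9), np.full(10, 0.25)])
print(upper_cusum_flags(mcav_stream))
```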

    Nonlinear regression in tax evasion with uncertainty: a variational approach

    One of the major problems in today's economy is the phenomenon of tax evasion. Linear regression offers a way to find a formula for investigating the effect of each variable on the final tax evasion rate. Since the tax evasion data in this study carry a great degree of uncertainty and the relationships between variables are nonlinear, a Bayesian method is used to address the uncertainty, along with 6 nonlinear basis functions to tackle the nonlinearity. Furthermore, a variational method is applied to Bayesian linear regression on the tax evasion data to approximate the model evidence. The dataset covers tax evasion in Malaysia for the period from 1963 to 2013, with 8 input variables. Results from the variational method are compared with the Maximum Likelihood Estimation technique for Bayesian linear regression, and the variational method provides more accurate predictions. This study suggests that, to reduce the final tax evasion rate relative to the current situation, the Malaysian government should decrease the direct tax and taxpayer income variables and increase the indirect tax and government regulation variables by 5% for small changes (10%-30%), and reduce the direct tax and taxpayer income variables and increase the indirect tax and government regulation variables by 90% for large changes (70%-90%).
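
    A minimal sketch of Bayesian linear regression with nonlinear (Gaussian) basis functions is shown below; it uses the closed-form conjugate posterior rather than the paper's variational approximation, and the one-dimensional synthetic data, basis centres and precision values are assumptions.

```python
import numpy as np

def design_matrix(x, centres, width=1.0):
    # Gaussian (RBF) basis expansion plus a bias column.
    phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])

# Synthetic one-dimensional stand-in; the study uses 8 input variables
# covering Malaysian tax evasion data from 1963 to 2013.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

centres = np.linspace(0, 10, 6)            # 6 nonlinear basis functions
Phi = design_matrix(x, centres)

alpha, beta = 1.0, 25.0                    # prior precision, noise precision
# Conjugate posterior over the weights:
#   S_N = (alpha*I + beta*Phi^T Phi)^-1,   m_N = beta * S_N Phi^T y
S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ y

# Predictive mean at new inputs.
x_new = np.linspace(0, 10, 5)
print(design_matrix(x_new, centres) @ m_N)
```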

    An Affective Decision Making Engine Framework for Practical Software Agents

    The framework of the Affective Decision Making Engine outlined here provides a blueprint for creating software agents that emulate psychological affect when making decisions in complex and dynamic problem environments. The influence of affect on the agent's decisions is mimicked by measuring the correlation of feature values, possessed by objects and/or events in the environment, against the outcome of goals that are set for measuring the agent's overall performance. The use of correlation in the Affective Decision Making Engine provides a statistical justification for preference when prioritizing goals, particularly when it is not possible to realize all agent goals. The simplification of the agent algorithm retains the function of affect for summarizing feature-rich dynamic environments during decision making.
    Keywords: Affective decision making, correlative adaptation, affective agent
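
    The correlation-based preference weighting can be sketched as follows; the feature matrix and goal-outcome values are hypothetical, and the engine's full goal prioritization logic is not reproduced.

```python
import numpy as np

# Hypothetical history of observed feature values and goal outcomes;
# each row is one decision episode, each column one environment feature.
features = np.array([
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.5],
    [0.2, 0.9, 0.3],
    [0.1, 0.8, 0.6],
])
goal_outcomes = np.array([1.0, 0.9, 0.1, 0.0])   # how well the goal was met

# Pearson correlation of each feature with the goal outcome; the agent can
# use these as affect-like preference weights when prioritizing goals.
weights = np.array([
    np.corrcoef(features[:, j], goal_outcomes)[0, 1]
    for j in range(features.shape[1])
])
print("feature-goal correlations:", weights)
```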